Scalable data management for map-reduce-based data-intensive applications: a view for cloud and hybrid infrastructures
Authors
Abstract
As Map-Reduce emerges as a leading programming paradigm for data-intensive computing, today’s frameworks that support it still have substantial shortcomings that limit its potential scalability. In this paper we discuss several directions where there is room for progress: they concern storage efficiency under...
Affiliations: (a) Corresponding author; (b) INRIA Research Center, Rennes – Bretagne Atlantique, Rennes, France; (c) INRIA Research Center, Grenoble Rhône – Alpes, Lyon, France; (d) CNRS/Université Lyon 1, Institut de Biologie et Chimie des Protéines, Lyon, France; (e) ENS Cachan – Antenne de Bretagne, Rennes, France; (f) IBM Products and Solutions Support Center, Montpellier, France; (g) Joint INRIA-UIUC Laboratory for Petascale Computing, Urbana-Champaign, USA; (h) Argonne National Laboratory, Argonne, USA; (i) CNRS, CC IN2P3, Lyon, France; (j) IBM Research, Dublin, Ireland
Copyright © 2012 Inderscience Enterprises Ltd.
Similar resources
Towards Scalable Data Management for Map-Reduce-based Data-Intensive Applications on Cloud and Hybrid Infrastructures
Data:
• Massive, unstructured data objects (Terabytes)
• Many data objects (10³-10⁶)
• High concurrency (10³ concurrent clients)
• Fine-grain access (Megabytes)
Applications:
• Map-Reduce-based data-analysis applications
• Governmental and commercial statistics
• Data-intensive HPC simulations
• Checkpointing for massively parallel computations
Platforms:
Data Replication-Based Scheduling in Cloud Computing Environment
Abstract— High-performance computing and vast storage are two key factors required for executing data-intensive applications. In comparison with traditional distributed systems like data grid, cloud computing provides these factors in a more affordable, scalable and elastic platform. Furthermore, accessing data files is critical for performing such applications. Sometimes accessing data becomes...
BlobSeer: Next-generation data management for large scale infrastructures
As data volumes increase at a high speed in more and more application fields of science, engineering, information services, etc., the challenges posed by data-intensive computing gain increasing importance. The emergence of highly scalable infrastructures, e.g. for cloud computing and for petascale computing and beyond, introduces additional issues for which scalable data management becomes a...
An Efficient Secret Sharing-based Storage System for Cloud-based Internet of Things
The Internet of Things (IoT) is a new information architecture based on the internet that develops interactions between objects and services in a secure and reliable environment. As the availability of many smart devices rises, secure and scalable mass storage systems for aggregate data are required in IoT applications. In this paper, we propose a new method for storing aggregate data in Io...
Energy Aware Resource Management of Cloud Data Centers
Cloud Computing, the long-held dream of computing as a utility, has the potential to transform a large part of the IT industry, making software even more attractive as a service and shaping the way IT hardware is designed and purchased. Virtualization technology forms a key concept for new cloud computing architectures. The data centers are used to provide cloud services burdening a significant...
Journal title:
- IJCC
Volume 2, Issue
Pages -
Publication date: 2013